Bootstrapping Syntax and Recursion using Alignment-Based Learning

Author

  • Menno van Zaanen
Abstract

This paper introduces a new type of unsupervised learning algorithm, based on the alignment of sentences and Harris’s (1951) notion of interchangeability. The algorithm is applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of the corpus. First, the algorithm aligns all sentences in the corpus in pairs, partitioning each pair into parts that the two sentences share and parts in which they differ. This information is used to find (possibly overlapping) constituents. Next, the algorithm selects (non-overlapping) constituents. Several instances of the algorithm are applied to the ATIS corpus (Marcus et al., 1993) and the OVIS corpus (Bonnema et al., 1997). Apart from the promising numerical results, the most striking result is that even the simplest algorithm based on alignment learns recursion.
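
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how sentences are aligned or how overlapping constituents are resolved, so this sketch assumes `difflib.SequenceMatcher` as a stand-in aligner and a simple first-come greedy rule as one possible selection strategy. The function names `hypothesise_constituents`, `crosses`, and `select_non_overlapping` are hypothetical.

```python
from difflib import SequenceMatcher


def hypothesise_constituents(sent_a, sent_b):
    """Align two tokenised sentences and return their dissimilar spans.

    Spans are (start, end) token offsets with an exclusive end.
    Assumption: difflib's matcher stands in for the paper's aligner.
    """
    matcher = SequenceMatcher(a=sent_a, b=sent_b, autojunk=False)
    opcodes = matcher.get_opcodes()
    # Without any shared part there is no common context to justify
    # treating the differing parts as interchangeable.
    if not any(tag == "equal" for tag, *_ in opcodes):
        return [], []
    spans_a, spans_b = [], []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            continue
        # Dissimilar parts sit in the same context in both sentences,
        # so each is recorded as a candidate constituent.
        if i2 > i1:
            spans_a.append((i1, i2))
        if j2 > j1:
            spans_b.append((j1, j2))
    return spans_a, spans_b


def crosses(x, y):
    """True if spans x and y overlap without one containing the other."""
    (a1, a2), (b1, b2) = x, y
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2


def select_non_overlapping(spans):
    """Greedy selection (an assumed strategy): keep a span unless it
    crosses one that was kept earlier. Nested spans are allowed."""
    kept = []
    for span in spans:
        if not any(crosses(span, k) for k in kept):
            kept.append(span)
    return kept


if __name__ == "__main__":
    a = "show me flights from Boston to Dallas".split()
    b = "show me flights from Denver to Atlanta".split()
    print(hypothesise_constituents(a, b))
    print(select_non_overlapping([(0, 3), (2, 5), (1, 2)]))
```

In the example pair, the shared words "show me flights from" and "to" form the common context, and the differing city names are the candidate constituents. Because nested spans survive selection, a constituent can in principle contain another of the same type, which is the precondition for the recursion the abstract reports.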

Similar resources

Corpus-based Learning in Stochastic OT-LFG – Experiments with a Bidirectional Bootstrapping Approach

This paper reports on experiments exploring the application of a Stochastic Optimality-Theoretic approach in the corpus-based learning of some aspects of syntax. Using the Gradual Learning Algorithm, the clausal syntax of German has to be learned from learning instances of clauses extracted from a corpus. The particular focus in the experiments was placed on the usability of a bidirectional app...


Meaning to Learn: Bootstrapping Semantics to Infer Syntax

Context-free grammars cannot be identified in the limit from positive examples (Gold 1967), yet natural language grammars are more powerful than context-free grammars and humans learn them with remarkable ease from positive examples (Marcus 1993). Identifiability results for formal languages ignore a potentially powerful source of information available to learners of natural languages, namely, ...


Syntactic bootstrapping.

Children use syntax to guide verb learning in a process known as syntactic bootstrapping. Recent work explores how syntactic bootstrapping works-how it begins, and how it interacts with progress in syntax acquisition. We review evidence for three claims about the mechanisms and representations underlying syntactic bootstrapping: (1) Learners are biased to represent linguistic knowledge in a use...


Early evidence for syntactic bootstrapping: 15-month-olds use sentence structure in verb learning

Infant language-learners receive input consisting of word sequences paired with world scenes. Based on these data, they start learning to understand sentences early in the second year, and ultimately build a lexicon and grammar that support broad generalization. Accounts of how they do so necessarily begin with the extra-linguistic world: The true novice, not yet knowing the words or syntax, mu...


Publication date: 2000